1 Introduction

Consider the Kohn-Sham (KS) equation

$$
H(X)X = X\Lambda, \qquad X^TX = I, \tag{1}
$$

where $X \in \mathbb{R}^{n\times k}$, the Hamiltonian $H(X) \in \mathbb{R}^{n\times n}$ is a matrix function of $X$ such that $H(X)X$ equals the gradient of a total energy functional $E(X)$ (defined in Section 2), and $\Lambda \in \mathbb{R}^{k\times k}$ is a diagonal matrix consisting of the $k$ smallest eigenvalues of $H(X)$. The KS equation is a fundamental nonlinear eigenvalue problem arising from density functional theory (DFT) for electronic structure calculations [9, 11], in which the charge density of electrons is defined as

$$
\rho(X) := \operatorname{diag}(XX^T), \tag{2}
$$

where $\operatorname{diag}(A)$ denotes the vector containing the diagonal elements of the matrix $A$.
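As a small illustration of (2), the density can be computed without forming the $n\times n$ matrix $XX^T$: its $i$-th entry is the squared Euclidean norm of the $i$-th row of $X$. The sketch below assumes NumPy, and the random orthonormal $X$ is test data only. When $X^TX = I$, the entries of $\rho(X)$ sum to $\operatorname{tr}(XX^T) = k$.

```python
import numpy as np

def density(X: np.ndarray) -> np.ndarray:
    """rho(X) = diag(X X^T), computed as row-wise sums of squares of X."""
    return np.einsum("ij,ij->i", X, X)

# Random X in R^{n x k} with orthonormal columns (reduced QR; test data only).
rng = np.random.default_rng(0)
n, k = 50, 4
X, _ = np.linalg.qr(rng.standard_normal((n, k)))

rho = density(X)
rho_dense = np.diag(X @ X.T)  # direct O(n^2 k) evaluation, for comparison only
```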

The most widely used approach for solving (1) is the self-consistent field (SCF) iteration. Starting from $X^0$ with $(X^0)^TX^0 = I$, the SCF iteration computes the $(i+1)$-th iterate $X^{i+1}$ as the solution of the linear eigenvalue problem

$$
H(X^i)X^{i+1} = X^{i+1}\Lambda^{i+1}, \qquad (X^{i+1})^TX^{i+1} = I, \tag{3}
$$

where $\Lambda^{i+1}$ is the diagonal matrix of the $k$ smallest eigenvalues of $H(X^i)$.

When the difference between two consecutive Hamiltonians is negligible, the system is said to be self-consistent and the SCF procedure is terminated. Heuristics have been proposed to accelerate and stabilize the SCF iteration. For example, the charge mixing techniques [6, 8] replace the Hamiltonian by a new matrix constructed from a linear combination of either the potentials or the charge densities computed in previous SCF iterations and a new one obtained from certain schemes.
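The basic iteration (3) can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes NumPy and a hypothetical toy Hamiltonian $H(X) = L + \alpha\operatorname{Diag}(L^{-1}\rho(X))$ of the form studied in Section 5, with $L$ a 1D finite-difference Laplacian. The stopping test on the change of the Hamiltonian mirrors the self-consistency criterion described above; the tolerance and the value of $\alpha$ are our choices.

```python
import numpy as np

def density(X):
    """rho(X) = diag(X X^T)."""
    return np.einsum("ij,ij->i", X, X)

def toy_hamiltonian(X, L, Linv, alpha):
    """Hypothetical toy model: H(X) = L + alpha * Diag(L^{-1} rho(X))."""
    return L + alpha * np.diag(Linv @ density(X))

def scf(L, alpha, k, X0, tol=1e-10, maxit=200):
    """Basic SCF iteration (3): X^{i+1} holds the k lowest eigenvectors of H(X^i)."""
    Linv = np.linalg.inv(L)
    X = X0
    H = toy_hamiltonian(X, L, Linv, alpha)
    for it in range(1, maxit + 1):
        _, V = np.linalg.eigh(H)           # eigenvalues returned in ascending order
        X = V[:, :k]                       # solve the linear eigenvalue problem (3)
        H_new = toy_hamiltonian(X, L, Linv, alpha)
        change = np.linalg.norm(H_new - H, 2)
        H = H_new
        if change < tol:                   # consecutive Hamiltonians (nearly) agree
            return X, it, True
    return X, maxit, False

n, k, alpha = 10, 2, 0.01
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Dirichlet Laplacian (SPD)
X0, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((n, k)))
X, iters, converged = scf(L, alpha, k, X0)

# Residual of the KS equation (1): H(X) X - X (X^T H(X) X).
H = toy_hamiltonian(X, L, np.linalg.inv(L), alpha)
residual = np.linalg.norm(H @ X - X @ (X.T @ H @ X))
```

For larger $\alpha$ the same loop may stall or oscillate, which is the behaviour discussed in the next paragraph.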

It is well known that the basic version of the SCF iteration (3) often converges slowly or fails to converge [?], even with the help of various heuristics developed over decades, yet a clear explanation is not available. In [1, 2, 13], the authors prove that, on certain types of problems, the sequence generated by the SCF iteration oscillates between two limit points that do not satisfy (1). Some numerical explanations are provided in [15] by viewing the SCF iteration as an indirect procedure for minimizing a sequence of quadratic surrogates. A condition is identified in [13] that guarantees that the SCF iteration is a contractive fixed-point iteration for a specific form of the Hamiltonian without any exchange-correlation term. Basically, the condition characterizes the contribution of the nonlinear component of the Hamiltonian.

In this paper, we establish conditions ensuring global and local convergence of the SCF iteration for general Kohn-Sham DFT from an optimization point of view. In fact, the KS equation (1) is closely related to the minimization problem with orthogonality constraints

$$
\min_{X\in\mathbb{R}^{n\times k}} E(X) \quad \text{s.t.} \quad X^TX = I. \tag{4}
$$

The first-order optimality conditions of (4) are the same as (1), except that the diagonal matrix $\Lambda$ consists of any $k$ eigenvalues of $H(X)$ rather than the $k$ smallest ones. Inspired by the expression of the exact Hessian of $E(X)$ derived in [12], we observe that the SCF iteration discards a "complicated" term in the Hessian of the total energy functional $E(X)$. Our analysis shows that this term plays an important role in the performance of the SCF scheme (3). Briefly speaking, the SCF iteration converges if the gap between the $k$th and $(k+1)$st eigenvalues of the Hamiltonian $H(X)$ outweighs the norm of the complicated term in the Hessian up to some constant. Our analysis only requires the assumption that the second-order derivative of the exchange-correlation energy functional is uniformly bounded from above, which implies the Lipschitz continuity of the Jacobian of the functional.

The rest of this paper is organized as follows. In Section 2, we describe the total energy functional, its gradient and Hessian, and the distance measurements between subspaces. The global and local convergence of the SCF iteration are presented in Sections 3 and 4, respectively. The relationship to the condition in [13] is clarified in Section 5. Finally, we conclude the paper in the last section.
2 Problem Statement



2.1 The KSDFT Total Energy Functional 

Consider the discretized Kohn-Sham (KS) total energy functional defined as

$$
E(X) := \frac{1}{4}\operatorname{tr}(X^TLX) + \frac{1}{2}\operatorname{tr}(X^TV_{ion}X) + \frac{1}{2}\sum_{l}\sum_{i}|x_i^Tw_l|^2 + \frac{1}{4}\rho^TL^{\dagger}\rho + \frac{1}{2}e^T\epsilon_{xc}(\rho), \tag{5}
$$

where $X = [x_1, \ldots, x_k] \in \mathbb{R}^{n\times k}$ and $\rho = \rho(X)$. The first term of (5) is the kinetic energy, where $L$ is a finite-dimensional representation of the Laplacian operator. The second term denotes the local ionic potential energy, where the diagonal matrix $V_{ion}$ contains the ionic pseudopotentials sampled on a suitably chosen Cartesian grid. The third term defines the nonlocal ionic potential energy, where $w_l$ represents a discretized pseudopotential reference projection function. The matrix $L^\dagger$ is the pseudo-inverse of $L$, and the fourth term denotes the Hartree potential energy, which models the classical electrostatic average interaction between electrons. The final term denotes the exchange-correlation energy, which describes the nonclassical interaction between electrons; here $e$ is the vector of all ones. A more detailed description of each term of $E(X)$ can be found in [14, 15].

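It can be verified that the gradient of $E(X)$ with respect to $X$ is $\nabla E(X) = H(X)X$, where the Hamiltonian is

$$
H(X) := \frac{1}{2}L + V_{ion} + \sum_l w_lw_l^T + \operatorname{Diag}(L^{\dagger}\rho) + \operatorname{Diag}(\mu_{xc}(\rho)^Te), \tag{6}
$$

$\mu_{xc}(\rho) := \frac{\partial \epsilon_{xc}}{\partial \rho} \in \mathbb{R}^{n\times n}$, and $\operatorname{Diag}(x)$ (with an uppercase letter D) denotes a diagonal matrix with $x$ on its diagonal. Let $\mathcal{L}(\mathbb{R}^{n\times k}, \mathbb{R}^{n\times k})$ denote the space of linear operators that map $\mathbb{R}^{n\times k}$ to $\mathbb{R}^{n\times k}$. The Fréchet derivative of $\nabla E(X)$ is defined as the (unique) function $\nabla^2E: \mathbb{R}^{n\times k} \to \mathcal{L}(\mathbb{R}^{n\times k}, \mathbb{R}^{n\times k})$ such that

$$
\lim_{\|S\|_F\to 0} \frac{\|\nabla E(X+S) - \nabla E(X) - \nabla^2E(X)[S]\|_F}{\|S\|_F} = 0.
$$

The next lemma gives an explicit form of the Hessian operator $\nabla^2E(X)$.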

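Lemma 2.1 (Lemma 2.1 in [12]). Suppose that $\epsilon_{xc}(\rho(X))$ is twice differentiable with respect to $\rho(X)$. Given a direction $S \in \mathbb{R}^{n\times k}$, the Hessian-vector product of $E(X)$ is

$$
\nabla^2E(X)[S] = H(X)S + B(X)[S], \tag{7}
$$

where $J := L^{\dagger} + \frac{\partial^2\epsilon_{xc}}{\partial\rho^2}e$ and

$$
B(X)[S] = 2\operatorname{Diag}\big(J\operatorname{diag}(SX^T)\big)X. \tag{8}
$$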
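Formulas (6)-(8) can be sanity-checked with central finite differences. The sketch below assumes NumPy and uses hypothetical ingredients that do not come from the paper: a periodic 1D Laplacian for $L$ (so that $L^\dagger$ is a genuine pseudo-inverse), random $V_{ion}$ and $w_l$, and the smooth stand-in $\epsilon_{xc}(\rho)_i = -c\log(1+\rho_i)$ for the exchange-correlation energy. For such an entrywise $\epsilon_{xc}$, $\mu_{xc}(\rho)^Te$ and $\frac{\partial^2\epsilon_{xc}}{\partial\rho^2}e$ reduce to the entrywise first and second derivatives.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m, c = 30, 3, 2, 0.5

# Hypothetical ingredients (test data only).
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, -1] = L[-1, 0] = -1.0               # periodic 1D Laplacian: symmetric PSD, singular
Ldag = np.linalg.pinv(L)                 # pseudo-inverse L^dagger
V_ion = np.diag(rng.standard_normal(n))  # stand-in local ionic potential
W = rng.standard_normal((n, m))          # columns play the role of w_l

def rho(X):
    return np.einsum("ij,ij->i", X, X)   # diag(X X^T)

def eps_xc(r):                           # hypothetical xc stand-in, entrywise
    return -c * np.log1p(r)

def d_eps_xc(r):                         # its first derivative, entrywise
    return -c / (1.0 + r)

def d2_eps_xc(r):                        # its second derivative, entrywise (bounded by c)
    return c / (1.0 + r) ** 2

def energy(X):                           # total energy (5)
    r = rho(X)
    return (0.25 * np.trace(X.T @ L @ X) + 0.5 * np.trace(X.T @ V_ion @ X)
            + 0.5 * np.sum((W.T @ X) ** 2) + 0.25 * r @ Ldag @ r
            + 0.5 * np.sum(eps_xc(r)))

def hamiltonian(X):                      # Hamiltonian (6)
    r = rho(X)
    return 0.5 * L + V_ion + W @ W.T + np.diag(Ldag @ r) + np.diag(d_eps_xc(r))

def B(X, S):                             # B(X)[S] = 2 Diag(J diag(S X^T)) X, as in (8)
    J = Ldag + np.diag(d2_eps_xc(rho(X)))
    return 2.0 * (J @ np.einsum("ij,ij->i", S, X))[:, None] * X

def grad(Y):                             # grad E(Y) = H(Y) Y
    return hamiltonian(Y) @ Y

X, _ = np.linalg.qr(rng.standard_normal((n, k)))
S = rng.standard_normal((n, k))
h = 1e-5

# Gradient: d/dh E(X + h S) at h = 0 should equal <H(X) X, S>.
fd_dir = (energy(X + h * S) - energy(X - h * S)) / (2 * h)
an_dir = np.sum((hamiltonian(X) @ X) * S)

# Hessian: d/dh grad E(X + h S) at h = 0 should equal H(X) S + B(X)[S], as in (7).
fd_hess = (grad(X + h * S) - grad(X - h * S)) / (2 * h)
an_hess = hamiltonian(X) @ S + B(X, S)

grad_err = abs(fd_dir - an_dir) / max(1.0, abs(an_dir))
hess_err = np.linalg.norm(fd_hess - an_hess) / np.linalg.norm(an_hess)
```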

We make the following assumption on the total energy.

Condition 2.2. The second-order derivative of the exchange-correlation functional $\epsilon_{xc}(\rho)$ is uniformly bounded from above, which implies the Lipschitz continuity of its Jacobian. Without loss of generality, we assume that there exists a constant $\sigma$ such that

$$
\|\operatorname{Diag}(\mu_{xc}(\rho)^Te) - \operatorname{Diag}(\mu_{xc}(\tilde\rho)^Te)\|_F \le \sigma\|\rho - \tilde\rho\|_2 \quad\text{and}\quad \left\|\frac{\partial^2\epsilon_{xc}}{\partial\rho^2}e\right\|_2 \le \sigma, \qquad \forall\, \rho, \tilde\rho.
$$

We next consider the second part of the Hessian operator, $B(X)[S]$, defined in (8).
Lemma 2.3. Suppose that Condition 2.2 holds. Let $X \in \mathcal{O}^{n\times k}$, $Z \in \mathcal{O}^{n\times(n-k)}$ and $S \in \mathbb{R}^{n\times k}$. Then

$$
\|B(X)[S]\|_F \le 2\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|S\|_2, \tag{9}
$$

$$
\|Z^TB(X)[ZZ^TS]\|_F \le 2\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|Z^TS\|_2. \tag{10}
$$

Proof. We only prove the second inequality. Using $\|Z^T\|_2 \le 1$ and $\|X\|_2 = 1$, we obtain

$$
\begin{aligned}
\|Z^TB(X)[ZZ^TS]\|_F &= \|2Z^T\operatorname{Diag}(J\operatorname{diag}(ZZ^TSX^T))X\|_F \\
&\le 2\|Z^T\|_2\,\|\operatorname{Diag}(J\operatorname{diag}(ZZ^TSX^T))\|_F\,\|X\|_2 \\
&\le 2\|\operatorname{Diag}(J\operatorname{diag}(ZZ^TSX^T))\|_F = 2\|J\operatorname{diag}(ZZ^TSX^T)\|_2 \\
&\le 2\|J\|_2\,\|\operatorname{diag}(ZZ^TSX^T)\|_2 \le 2\sqrt{n}\,\|J\|_2\,\max_i|(ZZ^TSX^T)_{ii}| \\
&\le 2\sqrt{n}\,\|J\|_2\,\|ZZ^TSX^T\|_2 \le 2\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|Z^TS\|_2,
\end{aligned}
$$

where the last inequality uses $\|J\|_2 \le \|L^\dagger\|_2 + \sigma$ (by Condition 2.2) and the fact that $\|ZMX^T\|_2 \le \|M\|_2$ for any matrix $M \in \mathbb{R}^{(n-k)\times k}$. This completes the proof. □

Our analysis also relies on the gap between the $k$th and $(k+1)$st eigenvalues of $H(X)$.

Condition 2.4. Let $\lambda_1 \le \cdots \le \lambda_k \le \lambda_{k+1} \le \cdots \le \lambda_n$ be the eigenvalues of a symmetric matrix $H \in \mathbb{R}^{n\times n}$. There exists a gap between the $k$th and $(k+1)$st eigenvalues, that is, $\lambda_{k+1} - \lambda_k \ge \delta$ for some positive constant $\delta$.

If Condition 2.4 holds for a sequence of matrices $\{H^i\}$ $(i = 1, 2, \ldots)$ whose $\delta$ is uniformly bounded away from zero, $\{H^i\}$ is said to be uniformly well posed (UWP) in [1, 13].

2.2 Distance Measurements

The SCF iteration maintains orthogonality in each iteration. The feasible set

$$
\mathcal{O}^{n\times k} := \{X \mid X \in \mathbb{R}^{n\times k},\ X^TX = I\}
$$

is often referred to as the Stiefel manifold. The solutions of the KS equation (1), the SCF iteration (3) and the minimization problem (4) are invariant with respect to orthogonal transformations. Namely, if $X$ is a solution, all points in the set $\{XU \mid U \in \mathbb{R}^{k\times k},\ U^TU = I_k\}$ are also solutions. Hence, the Euclidean distance is not suitable for measuring the distance from a feasible point to a solution or a solution set of (4). Inspired by the convergence analysis in [13], we introduce two subspace distance measurements defined in Section 4.3 of [?] for further analysis, i.e., for any $X_1, X_2 \in \mathcal{O}^{n\times k}$,

1. Chordal 2-norm: $d_{c2}(X_1, X_2) = \min_{Q_1, Q_2 \in \mathcal{O}^{k\times k}} \|X_1Q_1 - X_2Q_2\|_2$;

2. Projection 2-norm: $d_{p2}(X_1, X_2) = \|X_1X_1^T - X_2X_2^T\|_2$.

Let $U\Sigma V^T$ be the singular value decomposition of $X_1^TX_2$. It holds that

$$
d_{c2}(X_1, X_2) = \|X_1U - X_2V\|_2. \tag{11}
$$

We next present the equivalence between $d_{c2}$ and $d_{p2}$, which is not discussed in [?].
Lemma 2.5. Given any $X_1, X_2 \in \mathcal{O}^{n\times k}$, the chordal 2-norm and the projection 2-norm satisfy

$$
d_{c2}(X_1, X_2) \ge d_{p2}(X_1, X_2) \ge \frac{1}{\sqrt{2}}\,d_{c2}(X_1, X_2). \tag{12}
$$

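Proof. We first consider the first inequality in (12). Let us denote $\tilde X_1 = X_1U$ and $\tilde X_2 = X_2V$, where $U$ and $V$ are defined in (11). Then, we observe

$$
\begin{aligned}
0 \preceq (I_k - \tilde X_1^T\tilde X_2)(I_k - \tilde X_2^T\tilde X_1) &= I_k - \tilde X_1^T\tilde X_2 - \tilde X_2^T\tilde X_1 + \tilde X_1^T\tilde X_2\tilde X_2^T\tilde X_1 \\
&= (2I_k - \tilde X_1^T\tilde X_2 - \tilde X_2^T\tilde X_1) - (I_k - \tilde X_1^T\tilde X_2\tilde X_2^T\tilde X_1),
\end{aligned}
$$

which, since both matrices in the last line are positive semidefinite, yields

$$
\sigma_{\max}(I_k - \tilde X_1^T\tilde X_2\tilde X_2^T\tilde X_1) \le \sigma_{\max}(2I_k - \tilde X_1^T\tilde X_2 - \tilde X_2^T\tilde X_1). \tag{13}
$$

Let $Z_2 \in \mathcal{O}^{n\times(n-k)}$ be the orthogonal complement to $X_2$. The left-hand side of (13) satisfies

$$
\begin{aligned}
\sigma_{\max}(I_k - \tilde X_1^T\tilde X_2\tilde X_2^T\tilde X_1) &= \sigma_{\max}(\tilde X_1^T(I_n - \tilde X_2\tilde X_2^T)\tilde X_1) = \sigma_{\max}(\tilde X_1^TZ_2Z_2^T\tilde X_1) \\
&= \|Z_2^T\tilde X_1\|_2^2 = d_{p2}^2(\tilde X_1, \tilde X_2) = d_{p2}^2(X_1, X_2),
\end{aligned} \tag{14}
$$

where we use Theorem 2.6.1 of [?] and the invariance of $d_{p2}$ under orthogonal transformations. It follows from (11) that the right-hand side of (13) satisfies

$$
\sigma_{\max}(2I_k - \tilde X_1^T\tilde X_2 - \tilde X_2^T\tilde X_1) = \|\tilde X_1 - \tilde X_2\|_2^2 = d_{c2}^2(X_1, X_2), \tag{15}
$$

which together with (14) proves the first part of (12).

We now prove the second inequality of (12). According to (14) and the definitions of $U$ and $V$, we obtain

$$
d_{p2}^2(X_1, X_2) = \sigma_{\max}(I_k - \tilde X_1^T\tilde X_2\tilde X_2^T\tilde X_1) = \sigma_{\max}(I_k - \Sigma^2). \tag{16}
$$

It follows from (15) that

$$
d_{c2}^2(X_1, X_2) = \sigma_{\max}(2I_k - \tilde X_1^T\tilde X_2 - \tilde X_2^T\tilde X_1) = \sigma_{\max}(2I_k - 2\Sigma). \tag{17}
$$

Since $\tilde X_1$ and $\tilde X_2$ have orthonormal columns, each diagonal entry $s$ of the diagonal matrix $\Sigma$ lies in $[0, 1]$, so that $1 - s^2 = (1 - s)(1 + s) \ge \frac{1}{2}(2 - 2s)$. The proof is completed by combining (16) and (17). □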
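The two distance measurements and inequality (12) can be checked numerically. The sketch below assumes NumPy and computes $d_{c2}$ through formula (11), i.e., from the singular value decomposition of $X_1^TX_2$; the random subspaces and the rounding slack of $10^{-12}$ are our choices.

```python
import numpy as np

def d_c2(X1, X2):
    """Chordal 2-norm via (11): ||X1 U - X2 V||_2, where X1^T X2 = U Sigma V^T."""
    U, _, Vt = np.linalg.svd(X1.T @ X2)
    return np.linalg.norm(X1 @ U - X2 @ Vt.T, 2)

def d_p2(X1, X2):
    """Projection 2-norm: ||X1 X1^T - X2 X2^T||_2."""
    return np.linalg.norm(X1 @ X1.T - X2 @ X2.T, 2)

rng = np.random.default_rng(3)
n, k = 20, 4
results = []
for _ in range(100):
    X1, _ = np.linalg.qr(rng.standard_normal((n, k)))
    X2, _ = np.linalg.qr(rng.standard_normal((n, k)))
    results.append((d_c2(X1, X2), d_p2(X1, X2)))

# Lemma 2.5: d_c2 >= d_p2 >= d_c2 / sqrt(2), up to rounding.
lemma_holds = all(dc >= dp - 1e-12 and dp >= dc / np.sqrt(2) - 1e-12 for dc, dp in results)

# Invariance: replacing X2 by X2 Q (Q orthogonal) leaves both distances unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
invariant = (abs(d_c2(X1, X2 @ Q) - d_c2(X1, X2)) < 1e-12
             and abs(d_p2(X1, X2 @ Q) - d_p2(X1, X2)) < 1e-12)
```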

Theorem 4.11 in [10] and Corollary 7.2.5 in [5] are sufficient to guarantee the convergence of the invariant subspaces corresponding to the $k$ smallest eigenvalues.

Lemma 2.6. Suppose that the symmetric matrix $H \in \mathbb{R}^{n\times n}$ satisfies Condition 2.4. Let $\Delta H \in \mathbb{R}^{n\times n}$ be a symmetric perturbation to $H$, and let $X, \hat X \in \mathbb{R}^{n\times k}$ span the invariant subspaces associated with the $k$ smallest eigenvalues of $H$ and $H + \Delta H$, respectively. If $\|\Delta H\|_2$ is sufficiently small, it holds that

$$
d_{p2}(X, \hat X) \le C\cdot\|\Delta H\|_2, \tag{18}
$$

where $C$ is a parameter depending only on $\delta$ in Condition 2.4.
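Lemma 2.6 does not specify $C$. One admissible choice, which we take from the Davis-Kahan $\sin\Theta$ theorem combined with Weyl's inequality rather than from the cited references, is $d_{p2}(X, \hat X) \le \|\Delta H\|_2/(\delta - \|\Delta H\|_2) \le (2/\delta)\|\Delta H\|_2$ whenever $\|\Delta H\|_2 \le \delta/2$, i.e., $C = 2/\delta$. The sketch below, assuming NumPy, checks this on a random symmetric matrix with random perturbations (test data only).

```python
import numpy as np

def d_p2(X1, X2):
    return np.linalg.norm(X1 @ X1.T - X2 @ X2.T, 2)

def lowest_eigvecs(H, k):
    """Eigenvalues (ascending) and the eigenvectors of the k smallest ones."""
    w, V = np.linalg.eigh(H)
    return w, V[:, :k]

rng = np.random.default_rng(4)
n, k = 30, 5
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
w, X = lowest_eigvecs(H, k)
delta = w[k] - w[k - 1]                  # lambda_{k+1} - lambda_k (0-based indices)
C = 2.0 / delta                          # assumed constant, valid when ||Delta H||_2 <= delta/2

ratios = []
for scale in [0.4, 1e-1, 1e-2, 1e-3]:
    E = rng.standard_normal((n, n))
    E = (E + E.T) / 2
    E *= scale * delta / np.linalg.norm(E, 2)   # ||Delta H||_2 = scale * delta <= delta / 2
    _, X_hat = lowest_eigvecs(H + E, k)
    ratios.append(d_p2(X, X_hat) / np.linalg.norm(E, 2))
```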
3 Global Convergence of the SCF Iteration



In this section, we prove global convergence of the SCF iteration based on the reduction of the total energy functional between two consecutive iterates. Suppose that $X \in \mathcal{O}^{n\times k}$ is an arbitrary feasible point of (4), and $Y$ is obtained from the SCF iteration at $X$. Namely, the columns of $Y$ are the eigenvectors associated with the $k$ smallest eigenvalues of $H(X)$. Let $U\Sigma V^T$ be the singular value decomposition of $X^TY$, where $U, V \in \mathcal{O}^{k\times k}$. Then it follows from (11) that $\tilde Y := YVU^T$ satisfies

$$
\|X - \tilde Y\|_2 = d_{c2}(X, Y). \tag{19}
$$

Since $E(X)$ depends on $X$ only through $XX^T$, we have $E(\tilde Y) = E(Y)$; moreover, since the linear eigenvalue problem is invariant with respect to orthogonal transformations, $\tilde Y$ is also a solution of the SCF iteration at $X$. For simplicity of notation, we call $\tilde Y$ the closest SCF iterate obtained from $X$ under the chordal 2-norm.
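The construction of the closest SCF iterate can be sketched as follows, assuming NumPy and, as hypothetical test data, the toy energy $E(X) = \frac{1}{2}\operatorname{tr}(X^TLX) + \frac{\alpha}{4}\rho(X)^TL^{-1}\rho(X)$ of Section 5. The sketch compares $\|X - \tilde Y\|_2$ with $d_{c2}(X, Y) = \sqrt{2 - 2\sigma_{\min}(X^TY)}$, which is the value $\sigma_{\max}(2I_k - 2\Sigma)^{1/2}$ from the proof of Lemma 2.5, and confirms that $\tilde Y$ spans the same subspace as $Y$ and has the same energy.

```python
import numpy as np

n, k, alpha = 10, 2, 0.01
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical 1D Laplacian
Linv = np.linalg.inv(L)

def density(X):
    return np.einsum("ij,ij->i", X, X)

def energy(X):       # hypothetical toy energy (the special form of Section 5)
    r = density(X)
    return 0.5 * np.trace(X.T @ L @ X) + 0.25 * alpha * r @ Linv @ r

def hamiltonian(X):
    return L + alpha * np.diag(Linv @ density(X))

rng = np.random.default_rng(5)
X, _ = np.linalg.qr(rng.standard_normal((n, k)))
Y = np.linalg.eigh(hamiltonian(X))[1][:, :k]   # an SCF iterate obtained from X

U, sig, Vt = np.linalg.svd(X.T @ Y)            # X^T Y = U Sigma V^T
Y_tilde = Y @ Vt.T @ U.T                       # closest SCF iterate Y V U^T

dist = np.linalg.norm(X - Y_tilde, 2)                 # left-hand side of (19)
dc2 = np.sqrt(max(2.0 - 2.0 * sig.min(), 0.0))        # d_c2(X, Y) from the singular values
```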

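The second-order Taylor expansion of $E(\tilde Y)$ at $X$ gives

$$
E(\tilde Y) = E(X) + \langle\nabla E(X), \tilde Y - X\rangle + \frac{1}{2}\langle\nabla^2E(D_t)[\tilde Y - X], \tilde Y - X\rangle,
$$

where $D_t = X + t(\tilde Y - X)$ for some $t \in (0, 1)$, and the Euclidean inner product $\langle A_1, A_2\rangle$ between any real matrices $A_1, A_2 \in \mathbb{R}^{n\times k}$ is defined as $\operatorname{tr}(A_1^TA_2)$. Using the formulas of the gradient $\nabla E(X) = H(X)X$ and the Hessian-vector product (7), and adding and subtracting $\frac{1}{2}\langle H(X)(\tilde Y - X), \tilde Y - X\rangle$, we obtain

$$
\begin{aligned}
E(X) - E(\tilde Y) &= -\langle H(X)X, \tilde Y - X\rangle - \frac{1}{2}\langle H(X)(\tilde Y - X), \tilde Y - X\rangle \\
&\quad - \frac{1}{2}\langle\nabla^2E(D_t)[\tilde Y - X], \tilde Y - X\rangle + \frac{1}{2}\langle H(X)(\tilde Y - X), \tilde Y - X\rangle \\
&= \frac{1}{2}\big(\langle H(X)X, X\rangle - \langle H(X)\tilde Y, \tilde Y\rangle\big) - R^{(1)}(\tilde Y, D_t) - R^{(2)}(\tilde Y, D_t),
\end{aligned} \tag{20}
$$

where

$$
R^{(1)}(\tilde Y, D_t) = \frac{1}{2}\langle(H(D_t) - H(X))(\tilde Y - X), \tilde Y - X\rangle, \tag{21}
$$

$$
R^{(2)}(\tilde Y, D_t) = \frac{1}{2}\langle B(D_t)[\tilde Y - X], \tilde Y - X\rangle. \tag{22}
$$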

The first term on the right-hand side of (20) is the contribution of the SCF iteration to the reduction of $E(X)$. Lemma 1 in [13] ensures the following reduction.

Lemma 3.1. Suppose that Condition 2.4 holds at $H(X)$ and $Y$ is the SCF iterate obtained from $X$. Then

$$
\langle H(X)X, X\rangle - \langle H(X)Y, Y\rangle \ge \delta\cdot d_{p2}^2(X, Y). \tag{23}
$$
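Lemma 3.1 only involves the fixed matrix $H(X)$, so it can be spot-checked with a random symmetric matrix. The sketch below assumes NumPy; the matrix and the sampled feasible points are hypothetical test data, and the slack $10^{-10}$ absorbs rounding.

```python
import numpy as np

def d_p2(X1, X2):
    return np.linalg.norm(X1 @ X1.T - X2 @ X2.T, 2)

rng = np.random.default_rng(6)
n, k = 25, 3
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                          # a fixed symmetric matrix playing the role of H(X)
w, V = np.linalg.eigh(H)
Y = V[:, :k]                               # SCF iterate: eigenvectors of the k smallest eigenvalues
delta = w[k] - w[k - 1]                    # gap in Condition 2.4

margins = []
for _ in range(200):
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))
    reduction = np.trace(X.T @ H @ X) - np.trace(Y.T @ H @ Y)   # <H X, X> - <H Y, Y>
    margins.append(reduction - delta * d_p2(X, Y) ** 2)          # nonnegative by (23)
```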

We next estimate $R^{(1)}(\tilde Y, D_t)$ and $R^{(2)}(\tilde Y, D_t)$ to bound the reduction $E(X) - E(Y)$ from below.

Lemma 3.2. Suppose that Condition 2.2 holds. Let $X \in \mathcal{O}^{n\times k}$ be such that $H(X)$ satisfies Condition 2.4, and let $Y$ be the SCF iterate obtained from $X$. Then

$$
E(X) - E(Y) \ge \frac{1}{2}\delta\cdot d_{p2}^2(X, Y) - k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\cdot\big(d_{c2}^2(X, Y) + d_{c2}^3(X, Y)\big). \tag{24}
$$

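Proof. Let $\tilde Y$ be the closest SCF iterate obtained from $X$ under the chordal 2-norm. Using the facts that the second term on the left-hand side of (23) is invariant with respect to orthogonal transformations of $Y$ and that $d_{p2}(X, Y) = d_{p2}(X, \tilde Y)$, we obtain

$$
\langle H(X)X, X\rangle - \langle H(X)\tilde Y, \tilde Y\rangle \ge \delta\cdot d_{p2}^2(X, \tilde Y). \tag{25}
$$

A short calculation shows that

$$
\|XX^T - D_tD_t^T\|_2 \le 2\|X - D_t\|_2 \le 2\|\tilde Y - X\|_2. \tag{26}
$$

The definition of $H(X)$, Condition 2.2 and inequality (26) give

$$
\begin{aligned}
\|H(D_t) - H(X)\|_F &\le \|\operatorname{Diag}(L^\dagger(\rho(X) - \rho(D_t)))\|_F + \|\operatorname{Diag}(\mu_{xc}(\rho(X))^Te) - \operatorname{Diag}(\mu_{xc}(\rho(D_t))^Te)\|_F \\
&\le (\|L^\dagger\|_2 + \sigma)\,\|\rho(X) - \rho(D_t)\|_2 \\
&\le \sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|\operatorname{diag}(XX^T) - \operatorname{diag}(D_tD_t^T)\|_\infty \\
&\le \sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|XX^T - D_tD_t^T\|_2 \\
&\le 2\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|\tilde Y - X\|_2,
\end{aligned}
$$

which further yields

$$
\begin{aligned}
R^{(1)}(\tilde Y, D_t) &\le \frac{1}{2}\big|\langle(H(D_t) - H(X))(\tilde Y - X), \tilde Y - X\rangle\big| \\
&\le \frac{1}{2}\|H(D_t) - H(X)\|_F\,\|\tilde Y - X\|_2\,\|\tilde Y - X\|_F \\
&\le k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|\tilde Y - X\|_2^3,
\end{aligned} \tag{27}
$$

where the last step also uses $\|\tilde Y - X\|_F \le \sqrt{k}\,\|\tilde Y - X\|_2 \le k\,\|\tilde Y - X\|_2$. Arguing as in the proof of Lemma 2.3, with $\|D_t\|_2 \le 1$ in place of $\|X\|_2 = 1$, we obtain from (8)

$$
\begin{aligned}
\big|\langle B(D_t)[\tilde Y - X], \tilde Y - X\rangle\big| &\le \|B(D_t)[\tilde Y - X]\|_F\,\|\tilde Y - X\|_F \\
&\le 2\sqrt{n}\,\|J\|_2\,\|D_t\|_2\,\|D_t(\tilde Y - X)^T\|_2\cdot k\,\|\tilde Y - X\|_2 \\
&\le 2k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|\tilde Y - X\|_2^2,
\end{aligned}
$$

where the last inequality is implied by $\|D_t\|_2 = \|X + t(\tilde Y - X)\|_2 \le 1$. Consequently, we have

$$
R^{(2)}(\tilde Y, D_t) \le \frac{1}{2}\big|\langle B(D_t)[\tilde Y - X], \tilde Y - X\rangle\big| \le k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\|\tilde Y - X\|_2^2. \tag{28}
$$

Substituting (25), (27) and (28) into (20), we obtain

$$
E(X) - E(\tilde Y) \ge \frac{\delta}{2}\,d_{p2}^2(X, \tilde Y) - k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,\big(\|X - \tilde Y\|_2^2 + \|X - \tilde Y\|_2^3\big). \tag{29}
$$

Finally, inequality (24) is proved by using (19), $d_{p2}(X, \tilde Y) = d_{p2}(X, Y)$ and $E(\tilde Y) = E(Y)$. □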
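The bound (24) can be spot-checked on the toy model $H(X) = L + \alpha\operatorname{Diag}(L^{-1}\rho(X))$ of Section 5, which has no exchange-correlation term; for this model the same argument gives (24) with $\|L^\dagger\|_2 + \sigma$ replaced by $\alpha\|L^{-1}\|_2$. The sketch assumes NumPy; $L$, $\alpha$ and the sampled points are hypothetical test data, and a random check like this can support the bound but not prove it.

```python
import numpy as np

n, k, alpha = 10, 2, 1e-3
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # 1D Dirichlet Laplacian
Linv = np.linalg.inv(L)
const = k * np.sqrt(n) * alpha * np.linalg.norm(Linv, 2)  # replaces k sqrt(n) (||L^+||_2 + sigma)

def density(X):
    return np.einsum("ij,ij->i", X, X)

def energy(X):       # toy energy: 1/2 tr(X^T L X) + alpha/4 rho^T L^{-1} rho
    r = density(X)
    return 0.5 * np.trace(X.T @ L @ X) + 0.25 * alpha * r @ Linv @ r

def hamiltonian(X):  # satisfies grad E(X) = H(X) X
    return L + alpha * np.diag(Linv @ density(X))

def d_c2(X1, X2):
    U, _, Vt = np.linalg.svd(X1.T @ X2)
    return np.linalg.norm(X1 @ U - X2 @ Vt.T, 2)

def d_p2(X1, X2):
    return np.linalg.norm(X1 @ X1.T - X2 @ X2.T, 2)

rng = np.random.default_rng(7)
margins = []
for _ in range(200):
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))
    w, V = np.linalg.eigh(hamiltonian(X))
    Y, delta = V[:, :k], w[k] - w[k - 1]                    # SCF iterate and gap of H(X)
    dc, dp = d_c2(X, Y), d_p2(X, Y)
    lower = 0.5 * delta * dp**2 - const * (dc**2 + dc**3)   # right-hand side of (24)
    margins.append(energy(X) - energy(Y) - lower)
```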

We now present our global convergence result, based on the reduction of the total energy functional in Lemma 3.2 and the relationship between the distance measurements in Lemma 2.5.
Theorem 3.3. Suppose that Condition 2.2 holds. Let $\{X^i\}$ be a sequence generated by the SCF iteration such that $\{H(X^i)\}$ is uniformly well posed with a constant $\delta$. Then $\{X^i\}$ converges to a solution of the KS equation (1) if

$$
\delta > 12k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma). \tag{30}
$$

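Proof. It follows from Lemma 2.5 and Lemma 3.2 that, for any $i = 0, 1, 2, \ldots$,

$$
E(X^i) - E(X^{i+1}) \ge \Big(\frac{1}{4}\delta - k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\Big)\,d_{c2}^2(X^i, X^{i+1}) - k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\,d_{c2}^3(X^i, X^{i+1}). \tag{31}
$$

Since $X^i$ and $X^{i+1}$ both have orthonormal columns, we have

$$
d_{c2}(X^i, X^{i+1}) \le \|X^i\|_2 + \|X^{i+1}\|_2 = 2. \tag{32}
$$

Substituting (32) into (31), we obtain

$$
E(X^i) - E(X^{i+1}) \ge \Big(\frac{1}{4}\delta - 3k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\Big)\,d_{c2}^2(X^i, X^{i+1}). \tag{33}
$$

By summing (33) over all indices less than or equal to $i$, we obtain

$$
E(X^{i+1}) \le E(X^0) - \Big(\frac{1}{4}\delta - 3k\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\Big)\sum_{j=0}^{i} d_{c2}^2(X^j, X^{j+1}). \tag{34}
$$

Since $E(X)$ is bounded below on $\mathcal{O}^{n\times k}$, $E(X^0) - E(X^{i+1})$ is bounded above by some positive constant for all $i$, and by (30) the coefficient $\frac{1}{4}\delta - 3k\sqrt{n}(\|L^\dagger\|_2 + \sigma)$ is positive. Hence, by taking limits in (34), we obtain

$$
\lim_{i\to\infty} d_{c2}(X^i, X^{i+1}) = 0. \tag{35}
$$

Namely, $\{X^i\}$ converges. Let

$$
X^* := \lim_{i\to\infty} X^i, \tag{36}
$$

and let $\hat X$ consist of the eigenvectors associated with the $k$ smallest eigenvalues of $H(X^*)$. It follows from Lemma 2.6 that

$$
d_{p2}(X^{i+1}, \hat X) \le C\cdot\|H(X^i) - H(X^*)\|_2. \tag{37}
$$

Taking limits on both sides and using the continuity of $H(X)$, we obtain

$$
0 \le d_{p2}(X^*, \hat X) = \lim_{i\to\infty} d_{p2}(X^{i+1}, \hat X) \le \lim_{i\to\infty} C\cdot\|H(X^i) - H(X^*)\|_2 = 0. \tag{38}
$$

Namely, $X^*$ and $\hat X$ span the same subspace ($X^* = \hat X$ up to an orthogonal transformation), so $X^*$ is a solution of (1), which completes the proof. □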



Theorem 3.3 guarantees the convergence of the SCF iteration to a solution of the KS equation, which is stronger than the first-order optimality conditions of (4). In fact, when inequality (30) holds, the reduction of the total energy in (33) implies that any global minimizer of (4) is a solution of the KS equation.
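Theorem 3.3 can be illustrated on the toy model of Section 5, $H(X) = L + \alpha\operatorname{Diag}(L^{-1}\rho(X))$, where $\|L^\dagger\|_2 + \sigma$ becomes $\alpha\|L^{-1}\|_2$. The sketch below, assuming NumPy, picks a hypothetical $\alpha$ small enough that (30) holds at every iterate, then records the energies, which (33) predicts to be nonincreasing, and the steps $d_{c2}(X^i, X^{i+1})$, which (35) predicts to vanish.

```python
import numpy as np

n, k, alpha = 10, 2, 1e-4
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Linv = np.linalg.inv(L)
threshold = 12 * k * np.sqrt(n) * alpha * np.linalg.norm(Linv, 2)  # right-hand side of (30)

def density(X):
    return np.einsum("ij,ij->i", X, X)

def energy(X):
    r = density(X)
    return 0.5 * np.trace(X.T @ L @ X) + 0.25 * alpha * r @ Linv @ r

def hamiltonian(X):
    return L + alpha * np.diag(Linv @ density(X))

def d_c2(X1, X2):
    U, _, Vt = np.linalg.svd(X1.T @ X2)
    return np.linalg.norm(X1 @ U - X2 @ Vt.T, 2)

X, _ = np.linalg.qr(np.random.default_rng(8).standard_normal((n, k)))
energies, gaps, steps = [energy(X)], [], []
for _ in range(30):
    w, V = np.linalg.eigh(hamiltonian(X))
    gaps.append(w[k] - w[k - 1])             # gap of H(X^i), Condition 2.4
    X_next = V[:, :k]                        # SCF step (3)
    steps.append(d_c2(X, X_next))
    X = X_next
    energies.append(energy(X))
```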



4 Local Convergence of the SCF Iteration 



In this section, we establish local convergence of the SCF iteration by exposing the relationship between two consecutive iterates in terms of their distances to a particular solution of (1). The results are called local since the analysis relies on the Taylor expansion in a small neighborhood of that solution.

Lemma 4.1. Suppose that Condition 2.2 holds. Let $X^*$ be a solution to the KS equation (1) whose $H(X^*)$ satisfies Condition 2.4, let $X \in \mathcal{O}^{n\times k}$ lie in a sufficiently small neighborhood of $X^*$, and let $Y$ be the SCF iterate obtained from $X$. Then $d_{p2}(X^*, Y)$ is of the same order as $d_{p2}(X^*, X)$, namely

$$
d_{p2}(X^*, Y) = O(d_{p2}(X^*, X)). \tag{39}
$$

Proof. Using the continuity of $H(X)$, the fact that $X$ is in a sufficiently small neighborhood of $X^*$, and Lemma 2.6, we obtain

$$
d_{p2}(X^*, Y) \le C\cdot\|H(X) - H(X^*)\|_2 = O(\|X - X^*\|_2), \tag{40}
$$

which proves (39). □

Theorem 4.2. Suppose that Condition 2.2 holds. Let $X^*$ be a solution to the KS equation (1) whose $H(X^*)$ satisfies Condition 2.4, let $X$ be in a sufficiently small neighborhood of $X^*$, and let $Y$ be the SCF iterate obtained from $X$. Then

$$
d_{p2}(X^*, Y) \le \frac{2\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)}{\delta}\cdot d_{p2}(X^*, X) + O\big(d_{p2}^2(X^*, X)\big). \tag{41}
$$

Proof. For convenience of exposition, we introduce $\Delta X := X^* - X$ and $\Delta Y := X^* - Y$. Recalling that $\nabla E(X) = H(X)X$, we obtain the first-order Taylor expansion of $\nabla E(X^*)$ at $X$ as follows:

$$
\begin{aligned}
H(X^*)X^* = \nabla E(X^*) &= \nabla E(X) + \nabla^2E(X)[\Delta X] + O(\|\Delta X\|_2^2) \\
&= H(X)X + H(X)\Delta X + B(X)[\Delta X] + O(\|\Delta X\|_2^2) \\
&= H(X)Y + H(X)\Delta Y + B(X)[\Delta X] + O(\|\Delta X\|_2^2).
\end{aligned} \tag{42}
$$

Using Lemma 4.1 and substituting $X^*$ by $Y + \Delta Y$, we have

$$
\begin{aligned}
X^*(X^*)^TH(X^*)X^* &= (Y + \Delta Y)(Y + \Delta Y)^T\big(H(X)Y + H(X)\Delta Y + B(X)[\Delta X] + O(\|\Delta X\|_2^2)\big) \\
&= YY^TH(X)Y + Y\Delta Y^TH(X)Y + \Delta YY^TH(X)Y \\
&\quad + YY^TH(X)\Delta Y + YY^TB(X)[\Delta X] + O(\|\Delta X\|_2^2).
\end{aligned} \tag{43}
$$

By using the fact that $X^*$ is a solution of (1) and $Y$ is an SCF iterate obtained from $X$, we have

$$
H(X^*)X^* = X^*(X^*)^TH(X^*)X^*, \tag{44}
$$

$$
H(X)Y = YY^TH(X)Y. \tag{45}
$$

Combining (42), (43), (44) and (45), we obtain

$$
H(X)\Delta Y - \big(Y\Delta Y^TH(X)Y + \Delta YY^TH(X)Y + YY^TH(X)\Delta Y\big) = -(I - YY^T)B(X)[\Delta X] + O(\|\Delta X\|_2^2). \tag{46}
$$

It follows from Lemma 4.1 that

$$
H(X^*)\Delta Y - \big(X^*\Delta Y^TH(X)Y + \Delta Y(X^*)^TH(X^*)X^* + X^*Y^TH(X)\Delta Y\big) = -(I - X^*(X^*)^T)B(X)[\Delta X] + O(\|\Delta X\|_2^2). \tag{47}
$$

Let $Z^* \in \mathcal{O}^{n\times(n-k)}$ be the orthogonal complement to $X^*$. Multiplying both sides of (47) by $(Z^*)^T$ yields

$$
\begin{aligned}
&(Z^*)^TH(X^*)\Delta Y - (Z^*)^T\big(X^*\Delta Y^TH(X)Y + \Delta Y(X^*)^TH(X^*)X^* + X^*Y^TH(X)\Delta Y\big) \\
&\qquad = -(Z^*)^TB(X)[\Delta X] + (Z^*)^TX^*(X^*)^TB(X)[\Delta X] + O(\|\Delta X\|_2^2),
\end{aligned} \tag{48}
$$

which, since $(Z^*)^TX^* = 0$, can be rewritten as

$$
(Z^*)^TH(X^*)\Delta Y - (Z^*)^T\Delta Y(X^*)^TH(X^*)X^* = -(Z^*)^TB(X)[\Delta X] + O(\|\Delta X\|_2^2). \tag{49}
$$

Let $\Lambda_k$ and $\Lambda_{n-k}$ be the diagonal matrices consisting of the $k$ smallest and $n - k$ largest eigenvalues of $H(X^*)$, respectively. It follows from (44) and the definition of $Z^*$ that

$$
\Lambda_{n-k}(Z^*)^T\Delta Y - (Z^*)^T\Delta Y\Lambda_k = -(Z^*)^TB(X)[(Z^*(Z^*)^T + X^*(X^*)^T)\Delta X] + O(\|\Delta X\|_2^2). \tag{50}
$$

By using the orthogonality of $X$, we have $(X^* - \Delta X)^T(X^* - \Delta X) = X^TX = I$, which further gives

$$
(X^*)^T\Delta X = O(\|\Delta X\|_2^2). \tag{51}
$$

It follows from (51) that

$$
\Lambda_{n-k}(Z^*)^T\Delta Y - (Z^*)^T\Delta Y\Lambda_k = -(Z^*)^TB(X)[Z^*(Z^*)^T\Delta X] + O(\|\Delta X\|_2^2). \tag{52}
$$

Taking the Frobenius norm on both sides of (52), we have

$$
\|\Lambda_{n-k}(Z^*)^T\Delta Y - (Z^*)^T\Delta Y\Lambda_k\|_F \le \|(Z^*)^TB(X)[Z^*(Z^*)^T\Delta X]\|_F + O(\|\Delta X\|_2^2). \tag{53}
$$

Condition 2.4 implies

$$
\|\Lambda_{n-k}(Z^*)^T\Delta Y - (Z^*)^T\Delta Y\Lambda_k\|_F \ge \delta\,\|(Z^*)^T\Delta Y\|_F, \tag{54}
$$

since the $(i,j)$ entry of the matrix on the left-hand side equals $(\lambda_{k+i} - \lambda_j)\,[(Z^*)^T\Delta Y]_{ij}$ with $\lambda_{k+i} - \lambda_j \ge \lambda_{k+1} - \lambda_k \ge \delta$. By using Lemma 2.3 and substituting (54) into (53), we obtain

$$
\delta\,\|(Z^*)^T\Delta Y\|_F \le 2\sqrt{n}\,(\|L^\dagger\|_2 + \sigma)\cdot\|(Z^*)^T\Delta X\|_2 + O(\|\Delta X\|_2^2). \tag{55}
$$

It is clear that $d_{p2}(X^*, Y) = \|(Z^*)^T\Delta Y\|_2 \le \|(Z^*)^T\Delta Y\|_F$ and $d_{p2}(X^*, X) = \|(Z^*)^T\Delta X\|_2$. Recalling (51) and the definition of $Z^*$, we obtain

$$
\|\Delta X\|_2 \ge \|(Z^*)^T\Delta X\|_2 \ge \|\Delta X\|_2 - \|(X^*)^T\Delta X\|_2 = \|\Delta X\|_2 - O(\|\Delta X\|_2^2). \tag{56}
$$

Namely, $O(\|\Delta X\|_2) = O(d_{p2}(X^*, X))$ holds, which completes the proof. □
Hence, when $2\sqrt{n}\,(\|L^\dagger\|_2 + \sigma) < \delta$ holds, Theorem 4.2 implies that the SCF iteration converges linearly to the solution $X^*$ of the KS equation once the iterates lie in a sufficiently small neighborhood of $X^*$.
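The local rate in (41) can be observed on the toy model $H(X) = L + \alpha\operatorname{Diag}(L^{-1}\rho(X))$ of Section 5. It has no exchange-correlation term, so $\|L^\dagger\|_2 + \sigma$ is replaced by $\alpha\|L^{-1}\|_2$ and the leading factor in (41) becomes $2\sqrt{n}\,\alpha\|L^{-1}\|_2/\delta$. The sketch below assumes NumPy and uses hypothetical choices throughout: a long SCF run as the reference solution $X^*$, a small random perturbation of $X^*$ as the starting point, and a cutoff of $10^{-10}$ below which ratios are dominated by rounding and are not recorded.

```python
import numpy as np

n, k, alpha = 10, 2, 2e-3
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Linv = np.linalg.inv(L)

def density(X):
    return np.einsum("ij,ij->i", X, X)

def hamiltonian(X):
    return L + alpha * np.diag(Linv @ density(X))

def scf_step(X):
    return np.linalg.eigh(hamiltonian(X))[1][:, :k]

def d_p2(X1, X2):
    return np.linalg.norm(X1 @ X1.T - X2 @ X2.T, 2)

rng = np.random.default_rng(9)
# Reference solution X*: iterate SCF until it stagnates.
X_star, _ = np.linalg.qr(rng.standard_normal((n, k)))
for _ in range(100):
    X_star = scf_step(X_star)

w = np.linalg.eigvalsh(hamiltonian(X_star))
delta = w[k] - w[k - 1]
rate_bound = 2 * np.sqrt(n) * alpha * np.linalg.norm(Linv, 2) / delta   # leading factor in (41)

# Restart near X* and record contraction ratios d_p2(X*, X^{i+1}) / d_p2(X*, X^i).
X, _ = np.linalg.qr(X_star + 1e-3 * rng.standard_normal((n, k)))
ratios = []
for _ in range(10):
    d_old = d_p2(X_star, X)
    if d_old < 1e-10:
        break
    X = scf_step(X)
    ratios.append(d_p2(X_star, X) / d_old)
```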



5 Comparison with the Results of Yang et al. in [13]

In this section, we explain the difference between our convergence results and those of Yang et al. [13] on a special form of the total energy functional

$$
E(X) := \frac{1}{2}\operatorname{tr}(X^TLX) + \frac{\alpha}{4}\rho(X)^TL^{-1}\rho(X),
$$

whose Hamiltonian is

$$
H(X) = L + \alpha\operatorname{Diag}(L^{-1}\rho(X)).
$$

Since there is no exchange-correlation energy functional in this case, the constant $\sigma = 0$ in Condition 2.2, and $\|L^\dagger\|_2 + \sigma$ in our results is replaced by $\alpha\|L^{-1}\|_2$. Theorem 3.3 provides global convergence from any initial point if

$$
\alpha < \alpha_G := \frac{\delta}{12k\sqrt{n}\,\|L^{-1}\|_2}. \tag{57}
$$

According to Theorem 4.2, the SCF iteration converges linearly to the optimal solution from an initial point located in a neighborhood of that solution if $\alpha$ satisfies

$$
\alpha < \alpha_L := \frac{\delta}{2\sqrt{n}\,\|L^{-1}\|_2}. \tag{58}
$$

On the other hand, Yang et al. [13] prove convergence of a variant of the SCF iteration in which the density is computed by

$$
\rho = \operatorname{diag}(f_\mu(H)),
$$

where $f_\mu(t) = \frac{1}{1 + e^{\beta(t - \mu)}}$ and $f_\mu(H) = V\operatorname{Diag}(f_\mu(\lambda_1), \ldots, f_\mu(\lambda_n))V^T$, with $H = V\operatorname{Diag}(\lambda_1, \ldots, \lambda_n)V^T$ the eigenvalue decomposition of $H$. They establish global linear convergence if

$$
\alpha < \alpha_F := \frac{2}{n^4\beta\,\|L^{-1}\|_2}, \tag{59}
$$

where $\beta$ and $\mu$ satisfy

$$
\operatorname{trace}(f_\mu(H)) = k.
$$
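The smoothed density of this variant can be sketched as follows, assuming NumPy. Finding $\mu$ by bisection on $\operatorname{trace}(f_\mu(H)) = k$ (which is increasing in $\mu$) and evaluating $f_\mu$ through the identity $1/(1+e^x) = \frac{1}{2}(1 - \tanh(x/2))$ to avoid overflow are our implementation choices, not taken from [13]; the matrix $H$ below is hypothetical test data. For large $\beta$ and a positive gap, the result approaches $\operatorname{diag}(XX^T)$ with $X$ the $k$ lowest eigenvectors, i.e., the density (2).

```python
import numpy as np

def fermi_dirac_density(H, k, beta, iters=200):
    """rho = diag(f_mu(H)), f_mu(t) = 1 / (1 + exp(beta (t - mu))),
    with mu chosen by bisection so that trace(f_mu(H)) = k."""
    lam, V = np.linalg.eigh(H)

    def occupations(mu):
        # 1 / (1 + e^x) == (1 - tanh(x / 2)) / 2; the tanh form cannot overflow.
        return 0.5 * (1.0 - np.tanh(0.5 * beta * (lam - mu)))

    # The bracket gives trace ~ 0 at lo and trace ~ n at hi.
    lo, hi = lam[0] - 50.0 / beta, lam[-1] + 50.0 / beta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if occupations(mid).sum() < k:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return (V ** 2) @ occupations(mu), mu     # diag(V Diag(f) V^T) and mu

n, k = 10, 2
H = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical H (1D Laplacian)
rho_smooth, mu_smooth = fermi_dirac_density(H, k, beta=5.0)
rho_sharp, mu_sharp = fermi_dirac_density(H, k, beta=1e4)

# Reference: density (2) of the k lowest eigenvectors.
X = np.linalg.eigh(H)[1][:, :k]
rho_aufbau = np.einsum("ij,ij->i", X, X)
```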
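For a given constant $\gamma \ll 1$, the smoothing can be achieved by requiring

$$
f_\mu(\lambda_k) = \frac{1}{1 + e^{\beta(\lambda_k - \mu)}} \ge 1 - \gamma \quad\text{and}\quad f_\mu(\lambda_{k+1}) = \frac{1}{1 + e^{\beta(\lambda_{k+1} - \mu)}} \le \gamma,
$$

which is equivalent to

$$
\beta \ge \max\left\{\frac{\ln\frac{1-\gamma}{\gamma}}{\mu - \lambda_k},\ \frac{\ln\frac{1-\gamma}{\gamma}}{\lambda_{k+1} - \mu}\right\}.
$$

Notice that

$$
\min_{\mu \in (\lambda_k, \lambda_{k+1})} \max\left\{\frac{\ln\frac{1-\gamma}{\gamma}}{\mu - \lambda_k},\ \frac{\ln\frac{1-\gamma}{\gamma}}{\lambda_{k+1} - \mu}\right\} = \frac{2}{\lambda_{k+1} - \lambda_k}\,\ln\frac{1-\gamma}{\gamma},
$$

whose minimum is achieved at $\mu = \frac{\lambda_k + \lambda_{k+1}}{2}$. Therefore, taking $\delta = \lambda_{k+1} - \lambda_k$, we obtain $\beta \ge \frac{2}{\delta}\ln\frac{1-\gamma}{\gamma}$. Namely,

$$
\alpha_F \le \frac{\delta}{n^4\,\ln\frac{1-\gamma}{\gamma}\,\|L^{-1}\|_2}. \tag{60}
$$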

We notice that $k\sqrt{n} < n^{1.5} < n^4$ and $k\sqrt{n} \ll n^4$ when $n$ is sufficiently large. Moreover, $\ln\frac{1-\gamma}{\gamma} > 12$ if $\gamma < 6.1442\times 10^{-6}$, whereas $\ln\frac{1-\gamma}{\gamma}\cdot n^4 > 12k\sqrt{n}$ when $\gamma < 0.1070$ and $n \ge 2$. By comparing (60) to (57), we obtain $\alpha_F < \alpha_G$ for a reasonable value of $\gamma$. Furthermore, $\alpha_F < \alpha_L$ holds when $n$ is sufficiently large. Hence, we can conclude that our condition is no more restrictive than the one in [13].
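The two thresholds on $\gamma$ quoted above can be reproduced with a few lines of arithmetic, using only the Python standard library. The first solves $\ln\frac{1-\gamma}{\gamma} = 12$. The second uses $12k\sqrt{n} \le 12n^{1.5}$ (since $k \le n$), so it suffices that $\ln\frac{1-\gamma}{\gamma} \ge 12n^{-2.5}$, which is most demanding at $n = 2$. The range of $n$ in the spot check is our choice.

```python
import math

def log_ratio(gamma):
    """ln((1 - gamma) / gamma), the quantity appearing in (60)."""
    return math.log((1.0 - gamma) / gamma)

# ln((1 - g)/g) > 12  <=>  g < 1 / (1 + e^12)
gamma_1 = 1.0 / (1.0 + math.exp(12.0))

# ln((1 - g)/g) * n^4 >= 12 n^1.5  <=>  ln((1 - g)/g) >= 12 n^-2.5, tightest at n = 2
gamma_2 = 1.0 / (1.0 + math.exp(12.0 * 2.0 ** -2.5))

# Spot check with gamma = 0.107 < gamma_2: the bound 12 n^1.5 >= 12 k sqrt(n) is beaten
# for n = 2, ..., 1000, hence for every k <= n.
ok = all(log_ratio(0.107) * n**4 > 12 * n**1.5 for n in range(2, 1001))
```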

6 Conclusion 

We study the convergence of the well-known self-consistent field (SCF) iteration for solving the Kohn-Sham equation in density functional theory. Our analysis is based on the second-order Taylor expansion of the total energy functional. We show that a "complicated" part of the Hessian plays an important role in ensuring the convergence of the SCF iteration. Both global and local convergence can be guaranteed if the gap between the $k$th and $(k+1)$st eigenvalues of the Hamiltonian $H(X)$ outweighs the norm of the complicated term in the Hessian up to some constant. Our analysis only requires the assumption that the second-order derivatives of the exchange-correlation energy are uniformly bounded from above.

Although our conditions seem restrictive for the convergence of the SCF iteration, they still provide some insight into the performance of the algorithm. Recently, numerical evidence has shown that the exact Hessian can speed up the convergence of the SCF iteration in the trust-region framework [12]. However, our analysis does not cover acceleration schemes based on charge mixing, since these are fixed-point algorithms in terms of the charge density rather than methods for minimizing the total energy functional.